This is to check if BMI is a confouding factor between disease and gene expressions.
The counts tables were generated previously using HTseq.sh shell script and merge.command on Hera. It removes the outlier data point from the analysis, which was patient 29. This script requires a transcript counts table. There is also a sample mapping file called patient_sample_mapping.csv which links the diagnosis the the samples. This file was most recently processed on Tue Aug 12 10:21:39 2014.
This step gets the protein coding genes only.
## [1] 35
## [1] "Intercept" "BMI1.0.25."
## [3] "BMI1.25.30." "BMI1.30.50."
## [5] "groupControl" "groupAcromegaly"
## [7] "BMI1.0.25..groupControl" "BMI1.25.30..groupControl"
## [9] "BMI1.30.50..groupControl" "BMI1.0.25..groupAcromegaly"
## [11] "BMI1.25.30..groupAcromegaly" "BMI1.30.50..groupAcromegaly"
## [1] "acro.disease.normal <- results(acromegaly.cds.bmi, contrast=list(\"BMI.30.50..groupAcromegaly\", \"BMI.0.25..groupAcromegaly\"))\nsum(acro.disease.normal$padj<.05, na.rm=TRUE)\nacro.disease.overweight <- results(acromegaly.cds.bmi, contrast=list(\"BMI.25.30..groupAcromegaly\", \"BMI.25.30..groupControl\"))\nacro.disease.obese <- results(acromegaly.cds.bmi, contrast=list(\"BMI.30.50..groupAcromegaly\", \"BMI.30.50..groupControl\"))\n#acro.bmi.combined <- merge(acromegaly.results.bmi, acro.bmi.contrast, by.x=\"row.names\", by.y=\"row.names\")\nacro.bmi.combined <- cbind(acro.disease.normal, acro.disease.overweight)\ncolnames(acro.bmi.combined)[2:7]<- gsub(\"x\",\"AcrovsControl\", colnames(acro.bmi.combined)[2:7])\ncolnames(acro.bmi.combined)[8:13]<-gsub(\"y\",\"Obese_vs_Overweight\", colnames(acro.bmi.combined[8:13]))\nacro.bmi.combined <- acro.bmi.combined[order(acro.bmi.combined$padj.bmi, na.last=TRUE),]\n#acromegaly.results.bmi <- acromegaly.results.bmi[order(acromegaly.results.bmi$padj, na.last=TRUE),]\nwrite.csv(acro.bmi.combined, \"../data/processed/acromegaly_results_GRCh37.74-adjusting_bmi.csv\")"
## [1] 177
## [1] "Intercept" "BMI1_.30.50._vs_.0.30."
## [3] "group_Cushing.s_vs_Control" "BMI1.30.50..groupCushing.s"
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 3.948 0.4963 0.3532 0.19841 1.7e-15 1.23e-15
## Proportion of Variance 0.974 0.0154 0.0078 0.00246 0.0e+00 0.00e+00
## Cumulative Proportion 0.974 0.9897 0.9975 1.00000 1.0e+00 1.00e+00
## PC7 PC8 PC9 PC10 PC11
## Standard deviation 1.09e-15 3.44e-16 1.16e-16 9.35e-17 8.56e-17
## Proportion of Variance 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
## Cumulative Proportion 1.00e+00 1.00e+00 1.00e+00 1.00e+00 1.00e+00
## PC12 PC13 PC14 PC15 PC16
## Standard deviation 6.89e-17 5.98e-17 5.79e-17 5.11e-17 4.64e-17
## Proportion of Variance 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
## Cumulative Proportion 1.00e+00 1.00e+00 1.00e+00 1.00e+00 1.00e+00
## pdf
## 2
## pdf
## 2
## pdf
## 2
Clustering
## pdf
## 2
Separate the cushing’s patients into two groups: obese and not obese to examine the effect of the disease on gene expression when adjusting for age
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## [1] 104
## [1] "Intercept" "groupControl" "groupCushing.s"
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## [1] 795
## [1] "Intercept" "groupControl" "groupCushing.s"
This step annotates the data tables with the official gene symbols.
For the R session, the package versions were:
sessionInfo()
## R version 3.1.0 (2014-04-10)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RColorBrewer_1.0-5 scatterplot3d_0.3-35
## [3] reshape2_1.4 ggplot2_1.0.0
## [5] biomaRt_2.20.0 DESeq2_1.4.5
## [7] RcppArmadillo_0.4.320.0 Rcpp_0.11.2
## [9] GenomicRanges_1.16.3 GenomeInfoDb_1.0.2
## [11] IRanges_1.22.10 BiocGenerics_0.10.0
##
## loaded via a namespace (and not attached):
## [1] annotate_1.42.1 AnnotationDbi_1.26.0 Biobase_2.24.0
## [4] colorspace_1.2-4 DBI_0.2-7 digest_0.6.4
## [7] evaluate_0.5.5 formatR_0.10 genefilter_1.46.1
## [10] geneplotter_1.42.0 grid_3.1.0 gtable_0.1.2
## [13] htmltools_0.2.4 knitr_1.6 lattice_0.20-29
## [16] locfit_1.5-9.1 MASS_7.3-33 munsell_0.4.2
## [19] plyr_1.8.1 proto_0.3-10 RCurl_1.95-4.3
## [22] rmarkdown_0.2.49 RSQLite_0.11.4 scales_0.2.4
## [25] splines_3.1.0 stats4_3.1.0 stringr_0.6.2
## [28] survival_2.37-7 tools_3.1.0 XML_3.98-1.1
## [31] xtable_1.7-3 XVector_0.4.0 yaml_2.1.13